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I t Abstract 

^ Approximate Bayesian computation (ABC) is a class of algorithmic 

methods in Bayesian inference using statistical summaries and computer 
simulations. ABC has become popular in evolutionary genetics and in 
other branches of biology. However model selection under ABC algorithms 
^ has been a subject of intense debate during the recent years. Here we 

propose novel approaches to model selection based on posterior predictive 
\C distributions and approximations of the deviance. We argue that this 

O^l framework can settle some contradictions between the computation of 

model probabilities and posterior predictive checks using ABC posterior 
distributions. A simulation study and an analysis of a resequencing data 
set of human DNA show that the deviance criteria lead to sensible results 
in a number of model choice problems of interest to population geneticists. 
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Introduction 

Approximate Bayesian Computation (ABC) is a class of Monte- Carlo algorithms 
for parameter inference based on summary statistics instead of the full data 
(Beaumont et al 2002, Marjoram et al 2006, Beaumont 2010). More specif- 
ically, ABC algorithms use simulations from a stochastic model to generate 
random samples from an approximation of the posterior distribution of a mul- 
tidimensional parameter, 6, after reduction of the original data, yo, into a set of 
summary statistics, sq = s{yo). Here we will consider that sq are the only data 
available to ABC analyses, and will refer to (Robert et al 2011) for discussions 
related to the sufficiency of the summary statistics. ABC methods found their 
origin in evolutionary genetics (Pritchard et al 1999, Tavarc 2004), where they 
have been fruitfully applied to the inference of demographic history of several 
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species (Lopes and Beaumont 2009, Csillcry et al 2010, Beaumont 2010). Ex- 
amples of analyses encompass the evaluation of alternative scenarios of human 
evolution (Fagundes et al 2007, Patin et al 2009, Laval et al 2010), inference in 
demographic models of population expansion, bottleneck or migration (Thorn- 
ton et al 2006, Frangois et al 2008), population structure and adaptation (Bazin 
et al 2010). 

Current ABC algorithms fall into broad subclasses of methods that extend 
the standard subclasses of computational algorithms used in Bayesian statistics. 
The first class of algorithms makes use of the rejection algorithm to accept 
parameters generating simulated data close to the observations (Pritchard et 
al 1999). The rejection algorithm performs the following steps: 1) Generate a 
candidate value 9 from a prior distribution; 2) Simulate a data set, y, from a 
generating mechanism using the parameter 6, and compute the set of summary 
statistics s = s(y); 3) Accept the value of 9 if the (Euclidean) distance between 
s and So is less than e, a prespecified error value; 4) If rejected, go to 1). For 
this basic algorithm, the accepted values {9i) form a random sample from an 
approximation of the posterior distribution. 

The above approximation becomes exact as e goes to zero, but the algorithm 
is then highly inefficient. Recent techniques improve the approximation of the 
posterior distribution by applying linear or non-linear transforms (Beaumont 
et al 2002, Wegmann et al 2009, Leuenberger and Wegmann 2010, Blum and 
Francois 2010). In those improvements, the accepted values of the parameter, 
9i, are weighted by a quantity that depends on the distance between Sj and 
So- Then they are adjusted according to a regression transform, for example, 
9* = 9i — b'^{si — So), where 6 is a vector of linear regression coefficients (Beau- 
mont et al 2002). Several studies have provided evidence that the transformed 
parameters form a significantly better approximation of the posterior distribu- 
tion than the non-transformed ones (Beaumont et al 2002, Blum and Frangois 
2010), and regression adjustments are now widely used by ABC practioners 
(Thornton 2009, Cornuet et al 2009, Lopes and Beaumont 2009). Two other 
classes of algorithms implement Markov chain Monte Carlo methods without 
likelihood (Marjoram et al 2003, Bortot et al 2007) and iterative algorithms 
that were originally inspired by sequential Monte Carlo samplers (Sisson et al 
2007, Beaumont et al 2009, Toni et al 2009). 

An important aspect of ABC is its use for model selection in addition to 
parameter estimation. In general the aim of model selection is to find models 
receiving the highest posterior probabilities among a finite subset of candidates. 
Bayesian statisticians have devised numerous ways to evaluate and select mod- 
els for inference (Gelman et al 2004). Assuming that there are M models under 
consideration, the Bayesian paradigm includes model selection in the inference 
step, taking the model label as an additional parameter, m. In decision theo- 
retic approaches, model choice is performed on the basis of posterior probabili- 
ties, p(m|so), which are proportional to the marginal probabilities, p{sQ\m). In 
ABC these probabilities can be crudely estimated by counting simulations from 
model m that fall at a distance less than a fixed value to the observed data. 
More sophisticated estimators of posterior model probabilities can be found in 



(Beaumont 2008) or in (Leuenberger and Wegmann 2009). Alternatively se- 
quential Monte-Carlo algorithms can also used to estimate model probabilities 
via iterated importance sampling procedures (Toni and Stumpf 2010). 

Model selection using ABC algorithms has been recently questioned (Tem- 
pleton 2009, Beaumont et al 2010, Csillery et al 2010, Robert et al 2011). Here 
we point out a potentially serious concern when selecting models on the ba- 
sis of approximate posterior model probabilities. Because approximate model 
probability estimates are based on the rejection algorithm and ignore regression 
adjustments on parameter samples, we argue that model choice based on these 
probabilities does not apply to the (approximate) models in which we eventually 
make inference. To see this, assume 0\m = 6m, and let 



be the approximation of the posterior distribution obtained from the rejection 
algorithm, where p{9m) denotes the prior distribution on the parameter 0^. for 
model m. The joint distribution defining model m is then equal to 



Regression adjustments replace Pc(0m|so) with another distribution pi-og(^m|so)i 
which is generally closer to the exact posterior distribution. Clearly this change 
modifies the joint distribution in equation ([2]). Thus a model chosen on the basis 
of Pc(m|so) can be different from the model in which we eventually estimate 
parameter uncertainty. 

In the next section, we define two information theoretic criteria for model 
selection based on measures of model fit penalized by an estimate of the model 
complexity. While our focus in on regression methods, the ideas introduced in 
the present study apply to any ABC algorithm. The approach shares similarities 
with the popular Akaike information criterion (AIC, Akaike 1974) which is valid 
for the comparison of nested models (Burnham et al 2002, Johnson and Om- 
land 2004, Ripley 2004, Carsten et al 2009). The assumption of nested models 
is seldom appropriate to ABC, and we develop a statistical theory of approx- 
imate deviance information criteria (DIC), a generalization of AIC that does 
not require the assumption of nested models (Spiegelhalter et al 2002, Gelman 
et al 2004). Then we provide an example of ABC analysis where model choice 
based on approximate probabilities disagree with the prediction of adjusted 
models and DICs. Using simulations, we study the relevance of the proposed 
information criteria to inference in population genetics under various models of 
demographic history and population structure. In the last part of the study, we 
present an application to an empirical genetic data set of 20 noncoding DNA 
regions resequenced from 213 humans (Laval et al 2010), and we use DICs to 
question the replacement of Neanderthals by modern humans. 



Pe{Ora\sa) « Pr(||s - Sq\\ < e\9,m)p{0m) , 
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Pe{6„i, so\m) = Pe{6„i\so)pe{so\m) . 



(2) 



Theory 



In this section we describe model selection criteria based on posterior predictive 
distributions and approximations of the deviance. 

Information theoretic criteria In Bayesian analyses, the deviance informa- 
tion criterion summarizes the fit of a model by the posterior expectation of the 
deviance, D, and the complexity of a model by its effective number of parame- 
ters, pd (Spiegelhalter et al 2002). The models that receive the highest support 
from the data arc those with the lowest values of the DIG. More specifically, the 
definition of DIG is 

mG = D+pD, (3) 

where the deviance is minus twice the logarithm of the likelihood, D{9) = 
— 21ogp(so|^), D is the expected deviance 

D = Ee\,,[D{e)], (4) 

and pd is the difference between D and the deviance evaluated at a particular 
point estimate, D{d). An example of 9 often used in applications is the estimate 
of the posterior mean of the model parameter. 

A complication arises when models are defined hierarchically. In hierarchical 
models there is a hidden parameter and the posterior distribution decomposes 
as follows 

p{e, ip\so) oc p{so\(p)p{ip\0)p{e) . (5) 

In this situation, several definitions of the deviance and DIG have been proposed 
depending on the focus of the model (Spiegelhalter et al 2002; Geleux et al 2006). 
For example focusing on 6, the deviance can be taken equal to 

D{e) = -2 log (^J p{so\ip)p{md'fi^ , (6) 

and the computation of DIG should be modified accordingly. 

A hierarchical model approach to ABC Without regression adjustments, 
one way to define ABG is as a hierarchical Bayesian model in which the simulated 
summary statistics are viewed as latent variables. In this hierarchical model, 
the posterior distribution decomposes as 

p{e,s\so)<xp{so\s)p{s\e)p{e) (?) 

where s are the simulated statistics, and 6 becomes the "hyper-prior" parameter. 

In this model, p{s\9) is the probability of the generating the summary statis- 
tics, s, with parameter 9. To make use of a hierarchical framework, we define a 
surrogate likelihood function 

p(so|s) ^ K,{so -s) = -K f , (8) 



where K is a density function, called the kernel, and e is the error parameter. 
The distribution j5(so|s) can be viewed as a model for the observation error, and 
ABC performs exact inference under the assumption of model error (Wilkinson 
2008). The hierarchical model reformulation of ABC dates to the work of Mar- 
joram et al (2003) who used it as a rationale for defining MCMC algorithms 
without likelihood. This point of view has also proven useful in a variety of the- 
oretical works on ABC (Bortot et al 2007, Ratmann et al 2009, Wilkinson 2008). 
Regarding the basic rejection algorithm, the definition amounts to choosing a 
uniform density function over the interval (0, 1) for the kernel. In this case, we 
obtain 

p{so\0) « J^p{so\s)p{s\e)ds = Pr(||so - s|| < e\e) . (9) 

Extensions of the rejection algorithm use non-uniform kernels. For example, 
Beaumont et al (2002) implemented the Epanetchnikov function which is popu- 
lar in density estimation. Because we want to relate the quantity log K^{so — s) 
to a natural measure of model fit, we take the Gaussian kernel 

e-"'/2 , u€R. (10) 

With this choice, the quantity —2\ogKi{so — s) has a natural interpretation as 
the sum of squares error between observed and simulated statistics. 

The surrogate model presented above has a two-level hierarchy. Following 
Spiegelhalter et al (2002) or Celeux et al (2006), distinct definitions of DIC 
can be proposed, depending on whether the focus is on the fit of the summary 
statistics to the observed ones or on the model parameters themselves. Focusing 
on the parameter level allows us to better evaluate the predictive power of the 
fitted models, and we next introduce two definitions for an approximate deviance 
at this level. 

A first way to define a Bayesian deviance is by considering a posterior pre- 
dictive average of a "low level" deviance 

dev(e) = -2E,|4logp(so|s)] = -2 [ logp{so\s)p{s\e)ds . (11) 

J S 

In this case, the expected Bayesian deviance is 

= Ee|,Jdev(^)] = -2E3|,,[logp(so|s)] 

With this definition, a Monte-Carlo estimate of the expected deviance can be 
easily computed from the simulated data as follows 

2 " 

— Vlog(i^,(s^ -so)) , (12) 
n ^-^ 




where the s-' are summary statistics obtained from the posterior predictive dis- 
tribution p(s|so). To compute the penalty p]j, we generate n summary statistics 



s-' from p{s\0), where ^ is a point estimate of 6, for example an estimate of the 
posterior mean, E[0|so]. Applying the same formula as above, we come with an 
estimate D^{9) that we use to define p\, 

p], = D'-D\e). (13) 

Though the focus is on the parameter 9, the previous definition of a deviance 
is not equivalent to equation (|6|. A definition of the deviance in a hierarchical 
model consistent with this equation is as follows 

D{e) = -21ogp(so|0) = -2 log (^J^p{so\s)p{s\e)ds^ , (14) 

which is also equal to 

Die) = -2 log {E,ie[Ke{s - sq)]) • (15) 

With this definition, an estimate of the expected deviance requires two levels of 
Monte Carlo integration 

m / 1 " \ 

where we have m replicates, (0i)i=i,...,m) from the approximate posterior distri- 
bution, p{9\so), and each is sampled from p{s\9i), j — 1, . . . ,n. To compute 
p^, we generate n summary statistics, s-', from the conditional distribution 
p{s\9), where ^ is a point estimate of 9, and we set 



D'{9) ^-2log\ -y^K,{s^ -so) I . (17) 
Then we define 

pI = D'-D\9). (18) 

Both definitions of D and pu lead to distinct definitions of an information 
criterion, DICi = ^P^di * ~ 1; 2. DIC2 has the advantage of defining DIG for 
ABC models more rigorously than DICi, but it has the disadvantage of being 
computationally more intensive. 

Unlike model probabilities, D and pu can be computed from any approxi- 
mation of the posterior distribution. Using linear or non-linear regression ad- 
jusments, we can consider the transformed parameter posterior distribution, 
Preg(^'m|so): instead of ^£(6*^150)- To compute DIG, we then replace the 9iS by 
their adjusted values 0*'s, sampled from the modified posterior distribution, and 
generate posterior predictive densities from these values. In the sequel, DIGi 
will refer to predictive distributions generated from adjusted parameters. 

To motivate the use of information criteria and illustrate some of the issues 
presented in the introduction, we consider an example where samples of size 



n = 20 are simulated from a Gaussian disribution of mean /io = 2 and standard 
deviation ctq = 3 (then assumed to be unknown). The data are summarized 
by their empirical mean, standard deviation, skewness and kurtosis, and the 
sample size is known. 

We observed Sq = (2.00,3.11,-0.78,0.14). For these data two models are 
hypothesized. The sampling distribution of first model is a Gaussian distribution 
where the parameter, = (/i,(T^), corresponds to the mean and the variance. 
The prior distribution on is a Gaussian distribution of mean 2 and standard 
deviation 10. The prior distribution on a is an inverse-exponential distribution 
of rate 1. The sampling distribution of the second model is a Laplace distribution 
of mean 3 and rate A. The prior distribution on A is an exponential distribution 
of rate 1. 

To perform ABC analyses, we simulated 10,000 samples from each model, 
and pooled the 20,000 vectors of statistics into a single data set. Using an ac- 
ceptance rate of 10%, we estimated model probabilities using the R package 
abc (R core team 2010). This package computes the proportion of accepted 
simulations under both models, and also implements the weighted logistic re- 
gression method of (Beaumont 2008). The Bayes factor, defined as the ratio of 
marginal probabilities for two models mi and m2 can be estimated as the ratio 
of counts in favor of nii and m2 (Pritchard et al 1999, Grelaud et al 2009). The 
proportion of accepted simulations from model 2 (Laplace) was 0.83. Assuming 
a uniform prior distribution on models 1 and 2 wc obtained an approximation 
of the Bayes factor equal to BF « 5.02. The logistic regression estimate for the 
posterior probability of model 2 was 0.85, and we obtained BF w 5.75. Accord- 
ing to Jeffrey's scale on Bayes factors (Jeffreys 1961), there would be substantial 
evidence in favor of the Laplace model over the Gaussian model. 

In a second stage, we performed regression adjustments to the approximate 
posterior samples. The observed value of the kurtosis statistic was outside the 
tails of the posterior predictive distribution under the Laplace model. In con- 
trast it was within the tails of the posterior predictive distribution under the 
Gaussian model. Thus there is an apparent contradiction between the com- 
putation of model probabilities and the predictions from the posterior distri- 
bution. To better understand this contradiction, we drew the exact posterior 
distribution of cr^ under the Gaussian model (an inverse-Gamma(ll, 1 -I- 9.5fo) 
distribution, where Vq is the empirical variance). Although posterior density 
approximations are improved by the adjustment method (Figure 1), the esti- 
mates of model probabilities did not account for such improvements. Then we 
computed DICs for the Gaussian and Laplace models. Under the Gaussian 
model, we obtained DICi = 4.5 and DIC2 = 3.2. Under the Laplace model, we 
obtained DICi = 10.1 and DIC2 = 4.5. Once the corrections were applied, DIG 
indicated that the Gaussian model was a better choice than the Laplace model. 

To investigate whether our analysis was robust, we replicated it 100 times 
with values of sq sampled from the same Gaussian distribution in each replicate. 
The average value of the Bayes factor was around 3.45 (3.58 when using the 
logistic regression method) in favor of the Laplace model which obtained the 
highest posterior probability in 100% of the replicates. In contrast, the Gaussian 
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Figure 1: Posterior distributions of parameter a'^ (Gaussian model). A) The 
exact posterior distribution. B) The approximation obtained from the rejection 
algorithm. C) The approximation obtained with the regression adjustement. 



model was preferred in 100% of the replicates when DIG was used. The mean 
value of the information criterion was around 4.1 (2.4 for DIC2) under the 
Gaussian model, whereas is was around 12.4 (3.7 for DIC2) under the Laplace 
model. 



Population genetic models 

For each simulated data set and each model considered afterwards, we performed 
10,000 simulations under specified prior distributions (100,000 for the human 
data). We ran ABC analyses using an acceptance rate of 10% (1% for the human 
data). To compute DICi we additionally created n =1,000 replicates from the 
posterior predictive distribution. For DIC2, we used m = 200 and n = 200 
replicates. 

Demographic models. We generated coalescent simulations under selec- 
tively neutral models of micro-evolution for 3 distinct demographic scenarios: 
a sudden decline in population size (bottleneck without recovery), a constant 
population size, an exponentially growing population size;. In these data, fifty 
diploid individuals were genotyped at 20 non recombining loci. The data were 
simulated as DNA sequences under an infinitely many-sites model using the 
computer program ms (Hudson 2002). 

For each of the three models, we simulated one hundred replicates of the 
data with fixed parameter values. In ms, the parameters are expressed in units 
of the current population size. Nq. In our simulations, the normalized mutation 
rate was equal to 6 = 2iVo/i = 3. In the bottleneck model, the population size 
shrunk to a fraction x = 1/4 of its ancestral value, and this event occurred 
t = Q.2N0 generations in the past. In the expanding population model, the 
expansion rate was set to a = 2. 

The prior distribution on the mutation rate, 9, was uniformly distributed 
over (0, 15) for all models. In the bottleneck model, the date of the bottleneck 
event (in unit of A^o) was uniformly distributed over (0,1), and log^Qa; was 





Bottleneck 


Constant size 


Expansion 


Bottleneck* 


13.66 (7.72) 
10.08 (8.80) 


12.83 (7.99) 
10.59 (6.08) 


18.36 (9.82) 
17.04 (9.71) 


Constant size 


40.06 (10.93) 
30.24 (14.71) 


2.83 (0.18) 
2.71 (0.17) 


3.15 (0.30) 
2.95 (0.39) 


Expansion 


17.48 (4.59) 
11.40 (4.12) 


3.65 (0.64) 
3.58 (0.69) 


3.36 (0.23) 
2.87 (0.17) 



Table 1: Comparisons of dem,ographic m,odels using DICi and DIC2- The rows 
correspond to the models used for simulating the data, and the columns corre- 
spond to the models under which inference was performed. The values represent 
the mean and standard deviation of DIC's computed over 100 independent repli- 
cates for each model. Values under the bottleneck model were computed with 
the median of the posterior deviance instead of their mean, because the median 
is less sensitive to large deviations. 

uniformly distributed over (0,1.5). In the expansion model log^g 
uniformly distributed over (0,1.5). ABC analyses were performed using the 
following summary statistics: the Tajima's estimator tt, computed as the mean 
number of differences between pairs of sequences, Tajima's D (Tajima 1989), 
and Fay and Wu's H statistic (Fay and Wu 2000). The three statistics were 
averaged over the 20 loci. 

We applied ABC analyses and deviance information criteria to population 
genetic data simulated under a bottleneck, a constant population size and an 
expansion model (100 data sets for each demographic model). Table 1 reports 
congruent results for DICi and DIC2, and Figure 2 reports the outcome of model 
selection for the three models. 

Although DICi is generally greater than DIC2, the two measures produced 
highly correlated results (Figure 3). In these examples, both criteria agreed in 
their evaluation of models. When the data were simulated under the bottle- 
neck model, reported estimates were obtained with the median of the posterior 
deviance instead of their mean (The mean provided highly variable results). 
The preferred model was the bottleneck model in 61/100 replicates. When the 
data were simulated under the constant population size model, the preferred 
model was the constant size model in 81/100 replicates. When the data were 
simulated under the expanding population size model, the preferred model was 
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Figure 2: Demographic scenarios. Model choice using the deviance information 

criterion, DIC2, when a specific scenario is assumed. See the text for the defini- 
tion of models. The frequencies were computed over 100 independent replicates 
from each model. The same results were obtained with DICi. 

the expanding population model in 89/100 replicates. Overall, models that did 
not generate the data were selected in a small but non-neglectible number of 
replicates (23%). For data simulated under the bottleneck model, the lower 
performances can be explained by the relatively recent date of bottleneck event 
and by the choice of diffuse prior distributions. Both factors contribute to the 
difficulty of distinguishing between a bottleneck and a constant size model. In 
addition there is great variability in the simulated summary statistics, and this 
can explain why the bottleneck and constant population size models were diffi- 
cult to tease apart. 

To gain insight on model choice, we examined the results obtained for a 
particular replicate in further details. The replicate was obtained from the 
bottleneck model, where the effective mutation rate, the time of the bottleneck 
event and the severity of the bottleneck 1/x were set to the values (3, .2, 4). The 
observed summary statistics were equal to tt = 8.58, £) = 1.32 and H = 9.42. 
Under the bottleneck model, the point estimates of the effective mutation rate 
(posterior mean = 2.90, .95 CI = [1.34, 4.36]), the time of the bottleneck event 
(posterior mean = 0.35, .95 CI = [0.04, 0.62]) and the severity of the bottleneck 
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Figure 3: Correlation between DICi and DIC2. A) Inference under a population 
expansion model for data simulated under a constant population size model. B) 
Inference under a constant population size model for data simulated under an 
expanding population model. 



(posterior mean = 5.01, .95 CI — [2.18, 47.81]) were close to the values used 
in the simulation. DIC2 was equal to 14.13 (median estimate), whereas it was 
equal to 17.83 under the constant population size model. In this case the DIG 
slightly favored the model that generated the data. Under the bottleneck model, 
the observed values of the summary statistics were indeed within the tails of 
the posterior predictive distributions whereas they were outside the tails in 
the constant size population model (Figure 4). We also investigated why in 
some case DIG failed to select the model from which the data were simulated. 
For one bottleneck data set, the observed summary statistics were equal to 
TT = 6.77, D = 0.56 and \ogH = 1.71. DIG2 was equal to 10.11 (median 
estimate) under the bottleneck model, whereas it was equal to 6.08 under the 
constant population size model. Under both models, the observed values of the 
summary statistics were within the tails of the posterior predictive distributions, 
but the posterior distributions had larger tails under the bottleneck model than 
under the constant size model, and DIG favored the most parcimonious model 
(constant population size). This shows that in some cases the data do not 
contain enough information to discriminate between models, and highlights the 
difficulty of making model inference under coalescent models. 

Isolation with migration. Next we generated coalescent simulations under 
3 distinct models of population structure: divergence of two subpopulations, di- 
vergence with unidirectional gene flow from subpopulation 1 to subpopulation 
2, and divergence with migration between two subpopulations. Fifty diploid 
individuals in each population were genotyped at 100 non recombining loci. 
For each of the three models, we simulated one hundred replicates of the data 
with fixed parameter values. For these models, the parameters are expressed in 
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Figure 4: Posterior predictive distributions ofn, Tajima's D and the logarithm 
of Fay and Wu's H. The observed summary statistics were simulated under 
a bottleneck model, and were equal to tt = 9.77, D = 1.40 and H = 9.98. 
Inferences were performed under a bottleneck model and a constant population 
size model. The vertical bars correspond to the observed summary statistics. 
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Isolation 


2.97 (.15) 


3.37 (.25) 


4.67 (.71) 




2.67 (.12) 


3.18 (.39) 


4.15 (.81) 


Asymetric 


3.13 (.25) 


2.79 (.19) 


4.10 (.31) 


Migration 


2.84 (.23) 


2.65 (.20) 


3.41 (.33) 


Migration 


3.02 (.12) 


3.02 (.20) 


3.82 (.25) 




2.73 (.11) 


2.85 (.19) 


3.25 (.20) 



Table 2: Comparaison of isolation with migration m,odels. The rows correspond 
to the models used for simulating the data, and the columns correspond to the 
models under which inference was performed. The values represent the mean and 
standard deviation of DICi and DIC2 computed over 100 independent replicates 
for each model. 

units of the current subpopulation size, Nq. In our simulations, the normalized 
mutation rate was equal to 9 = 2NofJ, = 4. In each model, the two subpopu- 
lations had equal population size (both were equal to iVo). For the divergence 
model, the split occurred 0.7Nq generations ago. In the divergence with mi- 
gration model, the normalized migration rates were equal to miNo = 40 and 
m2No = 30, whereas in the unidirectional gene flow model, we took rriiNo = 40 
and m^NQ = 0. 

When running ABC inferences, subpopulations sizes were parameterized as 
UiNq and respectively, and we used uniform prior distributions on (0,3) 

both for vi and v^. In all models, the divergence time was uniformly dis- 
tributed over (0,A^o)- Normalized migration rates were uniformly distributed 
over (0,100). In addition to the three statistics tt, D and H used in demo- 
graphic models, we also computed an Fst statistic for each data set according 
to Hudson's estimator (Hudson et al 1992). The statistics were averaged over 
the 100 loci. 

We applied ABC analyses to 300 data sets simulated under the above models 
of population structure and gene flow between two subpopulations. Table 2 
reports congruent results for DICi and DIC2, and Figure 5 describes the results 
of model selection for the three models of population structure. 

As for the simulations of demographic scenarios, DICi and DIC2 produced 
highly correlated results, and agreed in their evaluation of models. When the 
data were simulated under the divergence model, the preferred model was the 
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Figure 5: Population structure and migration scenarios (100 replicates). Model 

choice using the deviance information criterion, DICi, when a particular sce- 
nario is assumed. See text for the definition of models. The frequencies were 
computed over 100 independent replicates from each model. 

divergence model in 98/100 replicates. When the data were simulated under a 
model with imidirectional migration, the preferred model was the model with 
asymetric migration in 92/100 replicates. When the data were simulated under 
the migration model, DIG favored cither a divergence model (49/100 replicates) 
or a model with asymetric migration (51/100), but it never chose the model 
model that generated the data. As previously, DICs favored explanatory mod- 
els that required the smallest number of parameters. A typical example of 
posterior predictive checks for data simulated under the bi-directional migra- 
tion model is displayed in Figure 6. For the 3 models, the observed summary 
statistics were between the tails of their posterior predictive distributions. The 
summary statistics used in this example were not informative enough to distin- 
guish between gene flow and divergence (Nielsen and Wakeley 2001, Hey and 
Machado 2003). The fact that DIG did not select the model that generated the 
data should not be considered as an error. The correct interpretation is that 
the three models were equally good at reproducing the observed statistics, and 
DIG values confirmed that we could not make any strong decision in this case. 



Divergence model (DIC = 2.24) 
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Asymetric migration model (DIC = 2.26) 
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Figure 6; Posterior predictive checks for three isolation with migration mod- 
els. Data were simulated under the bidirectional migration model (third model). 
The summary statistics were within the tails of the posterior predictive distribu- 
tions under all models. The vertical bars correspond to the observed summary 
statistics. 



Human data analysis 



In this section, wc used the deviance information criterion to question the re- 
placement of Neanderthals by modern human using modern human DNA. Ge- 
netic data based on resequencing from noncoding regions were utilized to dis- 
criminate between models of divergence incorporating various levels of admix- 
ture between the two species (Laval ct al 2010). We used sequence variation 
surveyed in DNA samples from 213 healthy donors. The panel included 118 
sub-Saharan African individuals, 47 European individuals, and 48 East-Asian 
individuals. Following Laval et al (2010), we re-analyzed 20 autosomal regions 
(27 kb per individual, mean sequence length per region of 1.33 kb) that met cri- 
teria of selective neutrality. The regions were selected to be independent from 
each other, residing at least 200 kb apart from any known or predicted gene or 
spliced expressed sequence tag, and not to in linkage disequilibrium with any 
known or predicted gene. 

To run an ABC analysis, we used a coalescent-based algorithm implemented 
in SIMCOAL 2 (Laval et al 2004), we generated synthetic data that consisted 
of 20 independent DNA sequences of 1.4 kb each. The mutation and the recom- 
bination rates of each region were drawn from gamma distributions (Table 3). 
The evolutionary scenarios assumed an early diffusion of archaic hominids out 
of Africa ^1.25 and ^2.25 million years ago and an African exodus of modern 
humans between ~40, 000- 100,000 years ago (Fagundes et al 2007). By tuning 
the replacement rate, S, we considered various levels of introgression of archaic 
genetic material into the modern human gene pool. Nineteen models differing 
in their prior distributions on 5 were considered. In these models, each prior 
distribution was uniform over an interval of length 0.1, from < 5 < 0.1 to 
0.9 < 5 < 1. Each interval was deduced from the previous one by a translation 
oih = 0.05. Our objective was to discriminate among models with low, medium 
or high levels of admixture. 

Under equilibrium assinnptions, the human effective population size has been 
estimated at ~10,000 individuals on the basis of human-chimpanzee divergence 
and intra-specific linkage disequilibrium levels (Harpending et al 1998). To give 
population size a degree of freedom and to match with a consensus estimate 
of human populations, we defined a gamma prior distribution with a mean of 
10,000 individuals and a .95 confidence interval of 3,000 to 21,000 individuals 
(Table 3). In all models we considered a constant size for the African, Asian 
and European modern populations (Laval ct al 2010). 

For each genomic region, we specifically computed global and pairwise i^sTi 
based on haplotype frequencies, the number of haplotypes, K, the number of 
polymorphisms, S, the nucleotide diversity, tt, and Tajimas D, the expected 
heterozygosity. We also computed the variance between regions for tt and D. 
The summary statistics were calculated by merging all population samples (ex- 
cept for population differentiation indices) in order to minimize the effects of 
recent demographic events related to the continental populations. 

Considering 19 models with distinct prior distributions for the replacement 
rate of Neanderthals by modern humans, we found a decreasing trend both in the 
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Table 3: Description of the prior distributions of simulated parameters. U and 
G denote Uniformly and Gamma distributed distributions, ND (for not drawn). 
Times are expressed in number of years (generation times of 25 years). The 
modern migration rates, m, is the proportion of migrants after the Out-of- Africa 
exodus. The mutation rate, jj,, is expressed in per generation per base, and the 
recombination rate, p, is expressed in per generation per pair of adjacent bases. 

expected deviance Di and in DICi- For high values of the replacement rate, 5, 
D\ and DICi were close to each other (Figure 7). These indices favored models 

exhibiting high values of the parameter 5 and low levels of genetic introgression 
of the human genetic pool by Neanderthal genes (Green et al 2010). 

Conclusion 

Model selection using ABC algorithms is notoriously difficult (Csillery et al 
2010b), and it has been the topic of an intense debate in evolutionary genetics 
(Templeton 2009, Beaumont et al 2010, Robert et al 2011). As we have shown 
in this study, some approximations of model probabilities computed by counting 
replicates from each model falling at a distance smaller than a given value, e, to 
the observed data can potentially lead to systematic errors in ABC, especially 
when e is not small. Using small acceptance rates implies, however, that gigantic 
numbers of simulations are performed, especially when more than ten summary 
statistics are used. 

Our purpose here was not to argue that any model choice previously per- 
formed in ABC studies on the basis of approximate model probabilities is unreli- 
able. Indeed, inconsistencies in model predictions may still be detected by using 
standard posterior predictive model checking procedures. In addition we do not 
argue that regression adjustments are problematic, and the results obtained 
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Figure 7: Replacement of Neanderthals by modern humans. Expected Deviance, 
Di, and DICi for models with different prior distributions on the replacement 
rate, S. The prior distributions are uniform over intervals of length 0.1. Each 
interval is deduced by translation of h = 0.05 from the previous one. 



under our simulation models should motivate users to apply these corrections 
systematically. 

Regression adjustments can correct the posterior values obtained from the 
rejection algorithm, but they have no effect on posterior model probabilities. 
To overcome this potential issue, we argue that model selection should be done 
for the statistical distributions that correspond to the transformed model, and 
we propose an approach based on the evaluation of posterior predictive quanti- 
ties. Our solution is based on the formulation of an approximate deviance, and 
bypasses the estimation of posterior model probabilities. Our simulation study 
showed that the concepts of approximate deviance provide reasonable answers 
to the model choice issue in the population genetic examples tested, which are 
representative of this field (Beaumont 2010, Nielsen and Wakeley 2001). Finally 
we emphasize that since the computation of DIG is based on posterior predictive 
distributions, this approach applies to any type of ABC algorithm. 
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